A sequential Monte Carlo EM approach to the transcription factor binding site identification problem
Authors
Abstract
MOTIVATION: A significant and stubbornly intractable problem in genome sequence analysis has been the de novo identification of transcription factor binding sites in promoter regions. Although theoretically pleasing, probabilistic methods have faced difficulties due to model mismatch and the nature of biological sequences. These problems force inference in a high-dimensional, highly multimodal space. Consequently, such methods often converge only locally and hence perform unsatisfactorily.

ALGORITHM: In this article, we derive and demonstrate a novel method that uses sequential Monte Carlo-based expectation-maximization (EM) optimization to improve performance in this scenario. The Monte Carlo element should make the algorithm more robust than classical EM. Furthermore, the parallel nature of the sequential Monte Carlo algorithm should make it more robust to multimodality than Gibbs sampling approaches.

RESULTS: We demonstrate the superior performance of this algorithm on both semi-synthetic and real data from Escherichia coli.

AVAILABILITY: http://sigproc-eng.cam.ac.uk/~ej230/smc_em_tfbsid.tar.gz
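The abstract does not spell out the algorithm. To illustrate what a "Monte Carlo element" inside EM means for motif finding, the sketch below implements a plain Monte Carlo EM for a hypothetical one-occurrence-per-sequence (OOPS) position weight matrix (PWM) model. In the E-step, the exact expectation over motif start positions is replaced by an average over sampled start positions. This is not the paper's sequential Monte Carlo EM, which additionally propagates a weighted population of particles. The model (OOPS, fixed background frequencies, uniform prior over start positions), the function names `encode`, `position_posterior`, and `mc_em`, and the parameters `W`, `n_iter`, `n_samples`, and `pseudo` are all assumptions made for illustration.

```python
import numpy as np

ALPHABET = "ACGT"
IDX = {c: i for i, c in enumerate(ALPHABET)}


def encode(seqs):
    """Map DNA strings to integer arrays (A=0, C=1, G=2, T=3)."""
    return [np.array([IDX[c] for c in s]) for s in seqs]


def position_posterior(x, theta, bg):
    """Posterior over motif start positions in one sequence.

    Hypothetical OOPS model: exactly one site of width W per sequence,
    uniform prior over start positions, i.i.d. background frequencies bg.
    """
    W = theta.shape[0]
    n_pos = len(x) - W + 1
    log_odds = np.log(theta) - np.log(bg)[None, :]  # W x 4 log-odds matrix
    scores = np.array(
        [log_odds[np.arange(W), x[j:j + W]].sum() for j in range(n_pos)]
    )
    scores -= scores.max()  # stabilise before exponentiating
    p = np.exp(scores)
    return p / p.sum()


def mc_em(seqs, W, n_iter=50, n_samples=20, pseudo=0.5, seed=0):
    """Plain Monte Carlo EM for a PWM (illustrative, not the paper's SMC-EM)."""
    rng = np.random.default_rng(seed)
    xs = encode(seqs)
    bg = np.bincount(np.concatenate(xs), minlength=4) + 1.0
    bg /= bg.sum()  # fixed background estimated once from all sequences
    theta = rng.dirichlet(np.ones(4), size=W)  # random initial PWM, W x 4
    for _ in range(n_iter):
        counts = np.full((W, 4), pseudo)  # pseudocounts keep theta > 0
        for x in xs:
            p = position_posterior(x, theta, bg)
            # Monte Carlo E-step: sample start positions instead of
            # summing over all of them with their exact posterior weights.
            starts = rng.choice(len(p), size=n_samples, p=p)
            for j in starts:
                counts[np.arange(W), x[j:j + W]] += 1.0 / n_samples
        # M-step: normalise the expected letter counts at each motif column.
        theta = counts / counts.sum(axis=1, keepdims=True)
    return theta
```

Like classical EM, this single-chain scheme can still settle on a local mode, for example a phase-shifted copy of the motif. That weakness is the multimodality problem that the abstract's population-based sequential Monte Carlo approach is meant to address.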
Similar resources
A Novel Transcription Factor Binding Sites Prediction Approach
Transcription factors (TFs) and their DNA binding motifs, called transcription factor binding sites (TFBSs), play important roles in most biological processes. However, the list of TFBSs still remains largely unknown. Machine learning approaches have been intensively applied to predict TFBSs. In this paper, a novel prediction approach has been presented based on Markov Chain Monte Carlo (MCMC) ...
QPS -- quadratic programming sampler, a motif finder using biophysical modeling
We present a Markov chain Monte Carlo algorithm for local alignments of nucleotide sequences aiming to infer putative transcription factor binding sites, referred to as the quadratic programming sampler. The new motif finder incorporates detailed biophysical modeling of the transcription factor binding site recognition, which gives rise to an intrinsic threshold discriminating putative binding sites fr...
The Use of Monte-Carlo Simulations in Seismic Hazard Analysis in Tehran and Surrounding Areas
Probabilistic seismic hazard analysis is a technique for estimating the annual rate of exceedance of a specified ground motion at a site due to the known and suspected earthquake sources. A Monte-Carlo approach is utilized to estimate the seismic hazard at a site. This method repeatedly resamples an earthquake catalog to construct synthetic catalogs to evaluate the ground motion hazard a...
Modeling within-motif dependence for transcription factor binding site predictions
MOTIVATION The position-specific weight matrix (PWM) model, which assumes that each position in the DNA site contributes independently to the overall protein-DNA interaction, has been the primary means to describe transcription factor binding site motifs. Recent biological experiments, however, suggest that there exists interdependence among positions in the binding sites. In order to exploit t...
Advanced Interacting Sequential Monte Carlo Sampling for Inverse Scattering
The following electromagnetism (EM) inverse problem is addressed. It consists in estimating local radioelectric properties of materials recovering an object from global EM scattering measurements, at various incidences and wave frequencies. This large scale ill-posed inverse problem is explored by an intensive exploitation of an efficient 2D Maxwell solver, distributed on high performance compu...
Journal: Bioinformatics
Volume 23, Issue 11
Pages: -
Published: 2007